
U256 limb-native arithmetic: beat Rust on UniswapV2 (87ns -> 20ns) #30

Merged
koko1123 merged 4 commits into main from fix/u256-benchmark-fairness
Mar 4, 2026

Conversation

@koko1123
Contributor

@koko1123 koko1123 commented Mar 3, 2026

Summary

Closes the u256 compound arithmetic performance gap with Rust/alloy. The uniswapv2_naive benchmark went from 87ns (3.48x slower than Rust) to 20ns (1.20x faster).

Key Optimizations

  • Scalar multiply (mulLimbScalar): Multiplying [4]u64 limbs by a single u64 uses 4 mul/umulh pairs instead of 10 for full 4x4 schoolbook. Used for fee constants (997, 1000) in UniswapV2.

  • Specialized 128-by-128 division (div2by2): Inline register-only path for the common case where both numerator and divisor fit in 2 limbs. Avoids Knuth D's runtime-loop array overhead and eliminates __udivti3 software division calls (~50ns penalty discovered via disassembly).

  • Limb-native benchmark: Rewrote uniswapv2_naive to use [4]u64 limb operations directly — apples-to-apples comparison with Rust's [u64; 4] U256 type.
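The scalar-multiply optimization above can be sketched in Rust as a hedged reference (this is not the project's Zig code; the function name and signature are illustrative): each limb needs one low-half and one high-half widening multiply, which on AArch64 lower to a mul/umulh pair, for 4 pairs total instead of the 10 a truncated 4x4 schoolbook product needs.

```rust
// Reference sketch of a limb-by-scalar multiply, analogous to mulLimbScalar.
// Multiplies a 256-bit value (4 little-endian u64 limbs) by a single u64,
// using one u128 widening multiply per limb (one mul/umulh pair on AArch64).
fn mul_limb_scalar(a: [u64; 4], b: u64) -> ([u64; 4], u64) {
    let mut out = [0u64; 4];
    let mut carry: u64 = 0;
    for i in 0..4 {
        let p = (a[i] as u128) * (b as u128) + (carry as u128);
        out[i] = p as u64;          // low 64 bits of the partial product
        carry = (p >> 64) as u64;   // high 64 bits carry into the next limb
    }
    (out, carry) // final carry is the overflow limb
}

fn main() {
    // Fee-constant case from UniswapV2: multiply by 997.
    let (limbs, overflow) = mul_limb_scalar([1_000, 0, 0, 0], 997);
    println!("{} {}", limbs[0], overflow); // 997000 0
}
```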

Other Changes

  • Fixed Rust benchmark: checked_div -> / (removed unfair Option handling overhead)
  • Made limb functions pub inline (mulLimbs, addLimbs, mulLimbScalar)
  • Used @bitCast for zero-cost u256 <-> [4]u64 conversion
  • Added *.a, *.o, *.s to .gitignore (build artifacts)
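The checked_div fix can be illustrated with a hedged Rust sketch (function names are ours, not the benchmark's): checked_div returns an Option, so the benchmark was timing a zero-divisor branch and Option wrapping on top of the division itself, while plain / is valid whenever the divisor is a known non-zero constant.

```rust
use std::hint::black_box;

// The "unfair" form: checked_div wraps the result in Option and branches on
// a zero divisor that black_box prevents the compiler from ruling out.
fn div_checked(a: u128, b: u128) -> Option<u128> {
    black_box(a).checked_div(black_box(b))
}

// The fixed form: plain division, valid because the benchmark's divisors
// are non-zero constants.
fn div_plain(a: u128, b: u128) -> u128 {
    black_box(a) / black_box(b)
}

fn main() {
    println!("{} {:?}", div_plain(100, 7), div_checked(100, 7)); // 14 Some(14)
}
```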

Benchmark Results

Benchmark                       eth-zig   alloy.rs          Result
---------------------------- ---------- ---------- ----------------
u256_add                           3 ns       3 ns              tie
u256_mul_small                     3 ns       2 ns         rs 1.50x
u256_mul_full                      3 ns       4 ns        zig 1.33x
u256_div_small                     6 ns      17 ns        zig 2.83x
u256_div_full                     44 ns      27 ns         rs 1.63x
u256_uniswapv2_naive              20 ns      24 ns        zig 1.20x
u256_mulDiv                       38 ns      29 ns         rs 1.31x
u256_uniswapv4_swap               45 ns      45 ns              tie

Score: eth-zig 3/8 | alloy.rs 3/8 | tied 2/8

Previous score: eth-zig 2/8 | alloy.rs 4/8 | tied 2/8

Test plan

  • zig build test passes
  • zig build bench-u256 runs successfully
  • bash bench/compare_u256.sh shows improved results
  • uniswapv2_naive: 87ns -> 20ns (4.35x speedup, now beats Rust)

Summary by CodeRabbit

  • Tests

    • Added comprehensive U256 benchmark suites (Rust and Zig) and an apples‑to‑apples comparison script; integrated criterion-style harnesses for consistent timing.
  • Chores

    • Added a Zig benchmark target to the build and updated ignore patterns for build artifacts.
  • Refactor

    • Optimized and inlined core 256‑bit arithmetic paths to improve benchmark performance.
  • Documentation

    • Updated README and benchmark results with revised measurements, claims, and comparison summaries.

@vercel

vercel bot commented Mar 3, 2026

The latest updates on your projects.

Project Deployment Actions Updated (UTC)
eth-zig Ready Ready Preview, Comment Mar 4, 2026 2:01pm

@coderabbitai
Contributor

coderabbitai bot commented Mar 3, 2026

📝 Walkthrough

Adds a U256 benchmarking suite and comparison tooling across Zig and Rust; introduces a new Rust bench, a Zig u256 benchmark executable, comparison scripts, and performance-focused changes plus visibility/inline adjustments in the Zig u256 implementation.

Changes

  • Benchmark Config & Cargo (bench/alloy-bench/Cargo.toml):
    Adds a second [[bench]] entry named u256_comparison (harness = false).

  • Alloy.rs Bench Files (bench/alloy-bench/benches/u256_comparison.rs, bench/alloy-bench/benches/eth_comparison.rs):
    Adds a new Rust U256 benchmark suite and simplifies sqrt_price construction in the existing bench to use a direct u128 constant (Q96).

  • Zig Bench Harness & Build (bench/u256_bench.zig, build.zig, build.zig.zon):
    Adds a Zig criterion-style u256 benchmark executable, wires a new u256-bench build/run step, and removes the zbench dependency from the manifest.

  • Comparison Orchestration (bench/compare_u256.sh, bench/compare.sh):
    Adds a bash orchestrator to run the Zig and Rust benches, parse and normalize their outputs (via embedded Python), and print side-by-side comparisons; updates parsing behavior for Zig output.

  • Zig u256 Implementation (src/uint256.zig):
    Large changes: exposes/inlines limb helpers (u256ToLimbs, limbsToU256, mulLimbs, addLimbs, fastMul, fastDiv, divLimbsDirect), adds mulLimbScalar and div2by2, and routes getAmountOut to the scalar limb path with a 2-limb division optimization.

  • Zig Bench Changes (bench/bench.zig, bench/keccak_compare.zig, bench/u256_bench.zig):
    Converts allocator-taking bench functions to allocator-free signatures and adds an internal benchmark harness (warmup, dynamic batching, timed runs) with print/JSON reporting.

  • Auxiliary / Docs / Results (README.md, bench/RESULTS.md):
    Updates README marketing and benchmark claims; updates RESULTS.md with new per-benchmark outcomes and harness details.

  • Misc / Gitignore (.gitignore):
    Adds ignore patterns for build artifacts (*.a, *.o, *.s).

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Orchestrator as compare_u256.sh
    participant ZigBench as eth.zig (u256_bench)
    participant RustBench as alloy.rs (u256_comparison)
    participant Parser as Python_Parser
    participant Output as Results_Table

    User->>Orchestrator: run compare_u256.sh
    Orchestrator->>ZigBench: build & run zig bench-u256
    ZigBench->>ZigBench: execute u256 benchmarks
    ZigBench-->>Orchestrator: bench output (text/JSON)
    Orchestrator->>RustBench: cargo bench --bench u256_comparison
    RustBench->>RustBench: execute criterion benches
    RustBench-->>Orchestrator: criterion output
    Orchestrator->>Parser: feed both outputs
    Parser->>Parser: extract ns/op, normalize names/units
    Parser-->>Orchestrator: structured timings
    Orchestrator->>Output: render comparison table & summary
    Output-->>User: display results

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes


Poem

🐰
I hopped through limbs of bits and bytes,
Counting carries through sleepless nights,
Zig and Rust raced, carrot in view,
Timings tallied — one, two, woo!
Benchmarks bloom where rabbits chew. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and specifically describes the main optimization: U256 limb-native arithmetic improvements that achieve a 4.35× speedup (87ns to 20ns) on UniswapV2, which aligns with the PR's primary objective of beating Rust performance on this benchmark.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bench/alloy-bench/benches/u256_comparison.rs`:
- Around line 78-84: The div benchmarks currently call U256::checked_div (in the
bench function names "div_small" and "div_full"), which adds Option-handling
overhead; replace those calls with the raw division operator (use
black_box(large) / black_box(ONE_ETH) and black_box(large) / black_box(FULL_C)
respectively) since ONE_ETH and FULL_C are non-zero constants, and update both
the div_small and div_full closures to perform direct division with / instead of
checked_div to make the Rust benchmarks comparable to Zig's fastDiv.

In `@bench/compare_u256.sh`:
- Around line 20-34: The script currently captures full benchmark logs into
shell variables ZIG_OUTPUT and RUST_OUTPUT and passes them as argv to the
embedded Python (python3 - "$ZIG_OUTPUT" "$RUST_OUTPUT"), which can hit
argument-length limits; instead write both outputs to temporary files (e.g.,
create secure temp files), pass the temp file paths or read them inside the
Python here-doc, and clean up the temps afterward; update references around
ZIG_OUTPUT, RUST_OUTPUT and the python3 invocation so the Python script reads
from those temp files (or from stdin via a single here-doc) rather than
receiving huge logs via argv, and apply the same change for the second
occurrence at lines 39-40.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f7fdc66 and 8eeced0.

📒 Files selected for processing (6)
  • bench/alloy-bench/Cargo.toml
  • bench/alloy-bench/benches/eth_comparison.rs
  • bench/alloy-bench/benches/u256_comparison.rs
  • bench/compare_u256.sh
  • bench/u256_bench.zig
  • build.zig

Three key optimizations to close the u256 compound arithmetic gap:

1. Scalar multiply (mulLimbScalar): 4 mul/umulh pairs instead of 10
   for single-limb multipliers like fee constants (997, 1000)

2. Specialized 128-by-128 division (div2by2): inline register-only
   division for 2-limb operands, avoiding Knuth D's array overhead
   and __udivti3 software division (~50ns penalty)

3. Limb-native benchmark: rewrite uniswapv2_naive to use [4]u64 limb
   operations directly (apples-to-apples with Rust's [u64; 4] U256)

Also: fix the Rust benchmark to use the / operator instead of
checked_div, gitignore build artifacts (*.a, *.o, *.s), make limb
functions pub inline, use @bitCast for u256<->limbs conversion.

Score: eth-zig 3/8 | alloy.rs 3/8 | tied 2/8 (was 2/8 | 4/8 | 2/8)
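Item 2 above (div2by2) can be cross-checked against native 128-bit division. The following Rust sketch is only a hedged semantic reference: the function name is ours, and u128 division in Rust itself lowers to __udivti3, so this shows what div2by2 computes, not the register-only way the Zig version computes it.

```rust
// Semantic reference for a 2-limb by 2-limb division (limbs little-endian).
// Note: u128 division here lowers to __udivti3 -- the point of the Zig
// div2by2 is to replace that call with inline register-only code, so this
// serves only as a correctness oracle, not as the optimization itself.
fn div_2by2(n: [u64; 2], d: [u64; 2]) -> [u64; 2] {
    let num = ((n[1] as u128) << 64) | (n[0] as u128);
    let den = ((d[1] as u128) << 64) | (d[0] as u128);
    assert!(den != 0, "division by zero");
    let q = num / den;
    [q as u64, (q >> 64) as u64] // quotient as little-endian limbs
}

fn main() {
    let q = div_2by2([10, 0], [2, 0]);
    println!("{} {}", q[0], q[1]); // 5 0
}
```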
@koko1123 koko1123 changed the title from "Fix u256 benchmark fairness: correct sqrt_price and add dedicated u256 comparison" to "U256 limb-native arithmetic: beat Rust on UniswapV2 (87ns -> 20ns)" on Mar 3, 2026

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (5)
bench/u256_bench.zig (2)

47-52: Add an overflow guard while doubling batch.

batch *= 2 has no cap. A simple max check avoids potential wraparound/infinite-loop behavior in calibration edge cases.

Small robustness patch
     var batch: u64 = 64;
     while (true) {
         timer.reset();
         for (0..batch) |_| func();
         if (timer.read() >= 100_000_000) break; // 100ms
+        if (batch > std.math.maxInt(u64) / 2) break;
         batch *= 2;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bench/u256_bench.zig` around lines 47 - 52, In the calibration loop that
calls timer.reset(); for (0..batch) |_| func(); if (timer.read() >= 100_000_000)
break; batch *= 2; add an overflow guard so doubling batch cannot wrap: before
multiplying check batch against a safe maximum (e.g., compare to (usize.max / 2)
or use a constant MAX_BATCH) and either clamp batch to that max or break out of
the loop if it would overflow; update the code around the while (true) loop so
batch is only doubled when safe, preventing wraparound/infinite-loop behavior.

196-217: Run each benchmark once and reuse the result for both outputs.

Right now every case is measured twice (table + JSON). That can introduce drift between reported values and unnecessarily extends run time. Capture one BenchResult per case and print both formats from it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bench/u256_bench.zig` around lines 196 - 217, Currently each benchmark
(benchAdd, benchMulSmall, etc.) is executed twice via runAndPrint and
runAndJson; change the flow so each benchmark is executed once to produce a
single BenchResult and then print both formats from that result. Concretely: add
or use a function that runs a benchmark and returns a BenchResult (e.g.,
runBenchmark or make runAndPrint return BenchResult), call it once for each
symbol (benchAdd, benchMulSmall, benchMulFull, benchDivSmall, benchDivFull,
benchUniswapV2Naive, benchUniswapV2Optimized, benchMulDiv, benchUniswapV4Swap)
storing the result, then pass that BenchResult into printing helpers (create or
adapt runAndPrint/runAndJson to accept a BenchResult and stdout) so stdout.print
is unchanged but each case is measured only once.
.gitignore (1)

8-11: Avoid globally ignoring assembly sources.

Line 11 (*.s) can accidentally hide new checked-in assembly source files. If the goal is only generated artifacts, keep this scoped to build output paths (or drop *.s).

Proposed tweak
 # Build artifacts
 *.a
 *.o
-*.s
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gitignore around lines 8 - 11, The .gitignore entry "*.s" is too broad and
can hide legitimate checked-in assembly sources; remove or narrow it: either
delete the "*.s" pattern or scope it to generated build output paths (e.g.,
replace "*.s" with something like "build/*.s" or "out/**/*.s" or add the pattern
under a build/ or obj/ directory), ensuring only generated assembly artifacts
are ignored while leaving source .s files trackable; update the .gitignore
accordingly where the "*.s" pattern appears.
src/uint256.zig (2)

99-105: Document limb ordering for public conversion helpers.

Now that u256ToLimbs and limbsToU256 are public, please document the exact limb order (least-significant to most-significant) to avoid cross-module misuse.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/uint256.zig` around lines 99 - 105, The public conversion helpers
u256ToLimbs and limbsToU256 lack documentation about limb ordering; update both
function doc comments to explicitly state that the returned/accepted [4]u64
array is ordered from least-significant limb at index 0 to most-significant limb
at index 3 (i.e., little-endian limb order), and mention that callers must use
this ordering when converting between u256 and limbs to avoid cross-module
misuse; add the same clarification to any related public API docs or comments
referencing these helpers.

387-390: Add a direct test for the new 2-limb fast path.

Line 387 introduces a dedicated dd == 2 and nn <= 2 branch, but there isn’t a focused test that guarantees this path remains correct across refactors.

Suggested test addition
+test "fastDiv 2-limb numerator 2-limb divisor fast path" {
+    const n: u256 = (`@as`(u256, 1) << 127) + 123456789;
+    const d: u256 = (`@as`(u256, 1) << 96) + 7;
+    try std.testing.expectEqual(n / d, fastDiv(n, d));
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/uint256.zig` around lines 387 - 390, Add a unit test that directly
exercises the new 2-limb fast path by invoking the division routine with dd == 2
and nn <= 2 so the branch with div2by2 is taken; create tests that call the
public division/quotient function (or the tested function that reads
numerator/denominator limbs) using two-limb divisors and one- and two-limb
numerators, covering typical, edge and random cases (e.g., small values, max
limb values, and boundary carries) and assert the quotient/remainder match a
reference implementation (big-int or existing slow-path division). Ensure the
test targets the symbols involved (div2by2 and the surrounding division
function) so future refactors will fail if the fast path is incorrect.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8eeced0 and 7d0573c.

📒 Files selected for processing (4)
  • .gitignore
  • bench/alloy-bench/benches/u256_comparison.rs
  • bench/u256_bench.zig
  • src/uint256.zig
🚧 Files skipped from review as they are similar to previous changes (1)
  • bench/alloy-bench/benches/u256_comparison.rs

Replace zbench with a custom benchmark harness (0.5s warmup, calibrated
batch sizes, 2s measurement window). zbench's ~42ns macOS timer floor was
masking real performance -- u256_add measured 25ns under zbench, actual is
2ns.

With accurate measurement: eth.zig wins 20/26 benchmarks vs alloy.rs
(was 14/26 under zbench). Key corrections:
- u256_add: 25ns -> 2ns (ties alloy)
- u256_mul: 26ns -> 2ns (2.50x faster)
- u256_div: 28ns -> 3ns (4.00x faster)
- abi_decode_dynamic: 40ns -> 17ns (7.94x faster)
- hex_encode: 36ns -> 11ns (1.09x faster, was a loss)
- uniswapv2: 30ns -> 10ns (1.30x faster, was a loss)

Also: remove zbench dependency from build.zig.zon, port
keccak_compare.zig to custom harness, use FixedBufferAllocator for
allocating benchmarks, update README and RESULTS.md.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
bench/bench.zig (2)

48-99: Extract shared bench harness to avoid cross-file drift.

runBench/runAndPrint logic is now repeated across benchmark entrypoints. Centralizing these helpers into one module will reduce maintenance risk when tuning warmup/calibration behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bench/bench.zig` around lines 48 - 99, The bench harness (runBench and
runAndPrint) is duplicated across benchmarks; extract these functions into a
single shared module (e.g., bench/harness.zig) and update each benchmark
entrypoint to import that module and call the shared runBench/runAndPrint.
Preserve the functions' signatures (comptime func: fn() void and
runAndPrint(name: []const u8, func: fn() void, stdout: anytype) !void), retain
Timer usage and constants (WARMUP_NS, BENCH_NS) in the shared module, and ensure
benchmarks replace their local implementations with calls to the imported
functions.

175-449: Factor out repeated FixedBufferAllocator setup in benchmark functions.

The same local allocator scaffolding is copied across many benches. A tiny helper wrapper would reduce copy/paste surface and keep per-bench logic focused on the operation under test.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bench/bench.zig` around lines 175 - 449, Many benchmarks duplicate the
FixedBufferAllocator setup (init of std.heap.FixedBufferAllocator, getting
alloc) — create a small helper (e.g., withFixedBufferAllocator or
makeFixedAllocator) that encapsulates the buffer allocation and returns the
allocator (or takes a callback that receives alloc) and replace the repeated
blocks in functions like benchAbiEncodeTransfer, benchAbiEncodeStatic,
benchAbiEncodeDynamic, benchRlpEncodeTx, benchTxHashEip1559, benchEip712Hash,
etc.; update those functions to call the helper for an allocator instead of
repeating the buf/fba/alloc boilerplate so each bench only contains the
test-specific logic.
bench/RESULTS.md (1)

7-7: Include exact toolchain/platform versions for reproducibility.

Adding CPU model/binning, macOS version, Zig version, and Rust toolchain version will make these benchmark deltas easier to validate over time.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bench/RESULTS.md` at line 7, Update the "Benchmarks run on Apple Silicon..."
line in RESULTS.md to include explicit toolchain and platform versions for
reproducibility: append CPU model/binning (e.g., M1/M2 with specific SKU), macOS
version, Zig version, Rust toolchain (cargo/rustc) and target triple, and any
compiler flags used (ReleaseFast/--release), so the existing Benchmarks run on
Apple Silicon... sentence contains exact CPU, macOS, Zig and Rust versions and
flags.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bench/RESULTS.md`:
- Line 57: Update the table row containing "Custom criterion-style harness" to
avoid overstating precision: replace "Accurate sub-ns measurement; zbench had
~25ns floor on macOS" with wording like "Accurate sub-25ns regime; zbench had
~25ns floor on macOS" (or similar phrasing) to reflect reported integer ns
outputs rather than claiming true sub-nanosecond precision.


ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7d0573c and 98b5c71.

📒 Files selected for processing (7)
  • README.md
  • bench/RESULTS.md
  • bench/bench.zig
  • bench/compare.sh
  • bench/keccak_compare.zig
  • build.zig
  • build.zig.zon
✅ Files skipped from review due to trivial changes (1)
  • README.md

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
bench/RESULTS.md (1)

78-78: ⚠️ Potential issue | 🟡 Minor

Reproduction command is incomplete for published results.

Line 78 shows cargo bench --bench eth_comparison, but the results table includes u256_mulDiv (row 5), which is only produced by the separate u256_comparison bench target. To fully reproduce the published numbers, run both bench targets:

(cd bench/alloy-bench && cargo bench --bench eth_comparison && cargo bench --bench u256_comparison)

Alternatively, reference the existing bash bench/compare_u256.sh that already runs the u256 benchmarks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bench/RESULTS.md` at line 78, The reproduction command in RESULTS.md only
runs the eth_comparison bench but the published table includes results from the
u256_comparison target (e.g., u256_mulDiv), so update the command to run both
bench targets or point to the existing script; specifically change the command
that runs `cargo bench --bench eth_comparison` to run `cargo bench --bench
eth_comparison && cargo bench --bench u256_comparison` or replace it
with/mention the existing `bench/compare_u256.sh` script so `u256_comparison`
results are produced when reproducing the published numbers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0f42096b-6187-402b-ba13-4cea1c4b0ada

📥 Commits

Reviewing files that changed from the base of the PR and between 98b5c71 and f88277e.

📒 Files selected for processing (1)
  • bench/RESULTS.md

@koko1123 koko1123 merged commit 6c09375 into main Mar 4, 2026
10 checks passed